# FP8 Quantized Inference
## Qwen3 32B FP8
Qwen · Apache-2.0 · Large Language Model · Transformers · 29.26k downloads · 47 likes

Qwen3-32B-FP8 is the latest 32.8B-parameter large language model in the Qwen series, supporting switching between thinking and non-thinking modes and offering exceptional reasoning, instruction-following, and agent capabilities.
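The FP8 checkpoint embeds its quantization config, so it loads like any other causal-LM checkpoint in Transformers, and the thinking/non-thinking switch is exposed through the chat template. A minimal sketch, assuming the Hugging Face repo id `Qwen/Qwen3-32B-FP8` and the `enable_thinking` template flag described in the Qwen3 model cards:

```python
# Minimal sketch: load the FP8 checkpoint with Transformers and toggle thinking mode.
# Assumes the repo id "Qwen/Qwen3-32B-FP8" and the `enable_thinking` chat-template
# flag from the Qwen3 model cards; requires a recent transformers release and a GPU
# stack that supports the checkpoint's quantization config.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # dtype/quantization picked up from the checkpoint config
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize FP8 quantization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # switch between thinking and non-thinking modes
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With `enable_thinking=True`, the Qwen3 documentation describes the model emitting its reasoning in a `<think>` block before the final answer; the flag above disables that for plain responses.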
## Qwen3 8B FP8
Qwen · Apache-2.0 · Large Language Model · Transformers · 22.18k downloads · 27 likes

Qwen3-8B-FP8 is the latest version in the Qwen series of large language models, offering FP8 quantization, seamless switching between thinking and non-thinking modes, and powerful reasoning capabilities with multilingual support.
## Qwen2.5 VL 72B Instruct FP8 Dynamic
parasail-ai · Apache-2.0 · Image-to-Text · Transformers · English · 78 downloads · 1 like

FP8-quantized version of Qwen2.5-VL-72B-Instruct, supporting vision-and-text input with text output, optimized and released by Neural Magic.
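FP8-dynamic checkpoints in this format are typically served with vLLM, which reads the quantization config directly from the checkpoint. A hedged sketch, assuming a repo id of `parasail-ai/Qwen2.5-VL-72B-Instruct-FP8-Dynamic` (not confirmed by the listing) and vLLM's OpenAI-style `chat` interface for passing an image:

```python
# Hedged sketch: serve the FP8-dynamic VL checkpoint with vLLM and send one image.
# The repo id below is an assumption based on the listing; substitute the real one.
from vllm import LLM, SamplingParams

llm = LLM(
    model="parasail-ai/Qwen2.5-VL-72B-Instruct-FP8-Dynamic",  # assumed repo id
    max_model_len=8192,
    limit_mm_per_prompt={"image": 1},  # one image per prompt for this example
    tensor_parallel_size=4,            # a 72B model generally needs several GPUs; adjust to your hardware
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        {"type": "text", "text": "Describe what this image shows."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```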
## Llama 3.1 8B Instruct FP8
nvidia · Large Language Model · Transformers · 3,700 downloads · 21 likes

FP8-quantized version of the Meta Llama 3.1 8B Instruct model, an autoregressive language model with an optimized transformer architecture and support for a 128K context length.
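A common reason to reach for these FP8 checkpoints is weight memory: FP8 stores one byte per parameter versus two for BF16/FP16. A back-of-the-envelope sketch using the nominal parameter counts implied by the model names above (KV cache, activations, and any unquantized layers are ignored):

```python
# Rough weight-memory comparison for the listed models: one byte per parameter in
# FP8 versus two in BF16/FP16. Parameter counts are nominal values taken from the
# model names above, not exact figures from the checkpoints.
GIB = 1024 ** 3

def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return params_billion * 1e9 * bytes_per_param / GIB

models = [
    ("Qwen3-32B", 32.8),
    ("Qwen3-8B", 8.0),
    ("Qwen2.5-VL-72B-Instruct", 72.0),
    ("Llama-3.1-8B-Instruct", 8.0),
]

for name, params in models:
    print(f"{name}: BF16 ≈ {weight_gib(params, 2):.1f} GiB, FP8 ≈ {weight_gib(params, 1):.1f} GiB")
```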